75 research outputs found
A Hierarchical Neural Autoencoder for Paragraphs and Documents
Natural language generation of coherent long texts like paragraphs or longer
documents is a challenging problem for recurrent networks models. In this
paper, we explore an important step toward this generation task: training an
LSTM (Long-short term memory) auto-encoder to preserve and reconstruct
multi-sentence paragraphs. We introduce an LSTM model that hierarchically
builds an embedding for a paragraph from embeddings for sentences and words,
then decodes this embedding to reconstruct the original paragraph. We evaluate
the reconstructed paragraph using standard metrics like ROUGE and Entity Grid,
showing that neural models are able to encode texts in a way that preserve
syntactic, semantic, and discourse coherence. While only a first step toward
generating coherent text units from neural models, our work has the potential
to significantly impact natural language generation and
summarization\footnote{Code for the three models described in this paper can be
found at www.stanford.edu/~jiweil/
Effective Approaches to Attention-based Neural Machine Translation
An attentional mechanism has lately been used to improve neural machine
translation (NMT) by selectively focusing on parts of the source sentence
during translation. However, there has been little work exploring useful
architectures for attention-based NMT. This paper examines two simple and
effective classes of attentional mechanism: a global approach which always
attends to all source words and a local one that only looks at a subset of
source words at a time. We demonstrate the effectiveness of both approaches
over the WMT translation tasks between English and German in both directions.
With local attention, we achieve a significant gain of 5.0 BLEU points over
non-attentional systems which already incorporate known techniques such as
dropout. Our ensemble model using different attention architectures has
established a new state-of-the-art result in the WMT'15 English to German
translation task with 25.9 BLEU points, an improvement of 1.0 BLEU points over
the existing best system backed by NMT and an n-gram reranker.Comment: 11 pages, 7 figures, EMNLP 2015 camera-ready version, more training
detail
When Are Tree Structures Necessary for Deep Learning of Representations?
Recursive neural models, which use syntactic parse trees to recursively
generate representations bottom-up, are a popular architecture. But there have
not been rigorous evaluations showing for exactly which tasks this syntax-based
method is appropriate. In this paper we benchmark {\bf recursive} neural models
against sequential {\bf recurrent} neural models (simple recurrent and LSTM
models), enforcing apples-to-apples comparison as much as possible. We
investigate 4 tasks: (1) sentiment classification at the sentence level and
phrase level; (2) matching questions to answer-phrases; (3) discourse parsing;
(4) semantic relation extraction (e.g., {\em component-whole} between nouns).
Our goal is to understand better when, and why, recursive models can
outperform simpler models. We find that recursive models help mainly on tasks
(like semantic relation extraction) that require associating headwords across a
long distance, particularly on very long sequences. We then introduce a method
for allowing recurrent models to achieve similar performance: breaking long
sentences into clause-like units at punctuation and processing them separately
before combining. Our results thus help understand the limitations of both
classes of models, and suggest directions for improving recurrent models
Addressing the Rare Word Problem in Neural Machine Translation
Neural Machine Translation (NMT) is a new approach to machine translation
that has shown promising results that are comparable to traditional approaches.
A significant weakness in conventional NMT systems is their inability to
correctly translate very rare words: end-to-end NMTs tend to have relatively
small vocabularies with a single unk symbol that represents every possible
out-of-vocabulary (OOV) word. In this paper, we propose and implement an
effective technique to address this problem. We train an NMT system on data
that is augmented by the output of a word alignment algorithm, allowing the NMT
system to emit, for each OOV word in the target sentence, the position of its
corresponding word in the source sentence. This information is later utilized
in a post-processing step that translates every OOV word using a dictionary.
Our experiments on the WMT14 English to French translation task show that this
method provides a substantial improvement of up to 2.8 BLEU points over an
equivalent NMT system that does not use this technique. With 37.5 BLEU points,
our NMT system is the first to surpass the best result achieved on a WMT14
contest task.Comment: ACL 2015 camera-ready versio
- …